2 Machine-Level Representation of Programs
The command-line option -Og instructs the compiler to apply a level of optimization that yields machine code that follows the overall structure of the original C code.
- program counter
- integer register file
- condition code registers
- vector registers
Example:
// mstore.c
long mult2(long, long);
void multstore(long x, long y, long* dest) {
long t = mult2(x, y);
*dest = t;
}
# generate assembly code
gcc -Og -S mstore.c
# compile and assemble
gcc -Og -c mstore.c
# mstore.s contains the following snippet
multstore:
.LFB0:
.cfi_startproc
endbr64
pushq %rbx
.cfi_def_cfa_offset 16
.cfi_offset 3, -16
movq %rdx, %rbx
call mult2@PLT
movq %rax, (%rbx)
popq %rbx
.cfi_def_cfa_offset 8
ret
.cfi_endproc
# mstore.o contains the following sequence
53 48 89 d3 e8 00 00 00 00 48 89 03 5b c3
Use disassemblers to inspect the contents of machine-code files:
linux> objdump -d mstore.o
mstore.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <multstore>:
0: f3 0f 1e fa endbr64
4: 53 push %rbx
5: 48 89 d3 mov %rdx,%rbx
8: e8 00 00 00 00 callq d <multstore+0xd>
d: 48 89 03 mov %rax,(%rbx)
10: 5b pop %rbx
11: c3 retq
We can also use gdb
to display the binary object code for a program:
linux> gdb mstore.o
(gdb) x/20xb multstore
0x0 <multstore>: 0xf3 0x0f 0x1e 0xfa 0x53 0x48 0x89 0xd3
0x8 <multstore+8>: 0xe8 0x00 0x00 0x00 0x00 0x48 0x89 0x03
0x10 <multstore+16>: 0x5b 0xc3 Cannot access memory at address 0x12
x/20xb multstore
: display (abbreviated ‘x’) 14 hex-formatted (also ‘x’) bytes (‘b’) starting at the address where function multstore is located
- x86-64 instructions can range in length from 1 to 15 bytes.
Add main.c
:
#include <stdio.h>
void multstore(long, long, long *);
int main() {
long d;
multstore(2, 3, &d);
printf("2 * 3 --> %ld\n", d);
return 0;
}
long mult2(long a, long b) {
long s = a * b;
return s;
}
gcc -Og -o prog main.c mstore.c
objdump -d prog
The outputs contains:
00000000000011d5 <multstore>:
11d5: f3 0f 1e fa endbr64
11d9: 53 push %rbx
11da: 48 89 d3 mov %rdx,%rbx
11dd: e8 e7 ff ff ff callq 11c9 <mult2>
11e2: 48 89 03 mov %rax,(%rbx)
11e5: 5b pop %rbx
11e6: c3 retq
11e7: 66 0f 1f 84 00 00 00 nopw 0x0(%rax,%rax,1)
11ee: 00 00
The addresses listed along the left are different— the linker has shifted the location of this code to a different range of addresses.
Combining assembly code with C programs
1. Data Formats
- word: 2 Bytes
- l (long words) for int: 4 Bytes; for double: 8 Bytes
- s for float: 4 Bytes
- q (quad words) for long or char*: 8 Bytes
2. Accessing Info.
x86-64 has a set of 16 general-purpose registers storing 64-bit values. (While x86-32 has only 8.)
%rip
: program counter
2.1 Operand Specifiers
NOTE: \(s = 1,2,4,8\)
2.2 Data Movement Instructions
The instructions in a class perform the same operation but with different operand sizes.
2.2.1 Simple
- A move instruction cannot have both operands refer to memory locations!
- Copying a value from one memory location to another requires two instructions—the first to load the source value into a register, and the second to write this register value to the destination.
- The size of the register must match the size designated by the last character of the instruction!
-
For most cases, the
mov
instructions will only update the specific register bytes or memory locations indicated by the destination operand (with the remaining bytes unchanged).- Except for
movl
(long word is 4 Bytes, half of the 64-bit register), it will also set the high-order 4 bytes of the register to 0.
- Except for
-
The regular
movq
instruction can only have immediate source operands that can be represented as 32-bit two’s-complement numbers. (This value is then sign extended to produce the 64-bit value for the destination.)- The
movabsq
instruction can have an arbitrary 64-bit immediate value as its source operand and can only have a register as a destination. (I → R)
- The
2.2.2 Zero-extending
- size designators as its final two characters—the first specifying the source size, and the second specifying the destination size
2.2.3 Sign-extending
-
Note the absence of an explicit instruction to zero-extend a 4-byte source value to an 8-byte destination in Figure 3.5. Such an instruction would logically be named movzlq, but this instruction does not exist. Instead, this type of data movement can be implemented using a movl instruction having a register as the destination. This technique takes advantage of the property that an instruction generating a 4-byte value with a register as the destination will fill the upper 4 bytes with zeros.
-
cltq
has no operands—it always uses register %eax as its source and %rax as the destination for the sign-extended result. It therefore has the exact same effect as the instruction movslq %eax, %rax, but it has a more compact encoding.
Data Movement Example:
Two features:
- Dereferencing a pointer involves copying that pointer (addr) into a register (
%rdi
), and then using this register in a memory reference ((%rdi)
). - Local variables such as x are often kept in registers rather than stored in memory locations.
2.2.4 Stack Operation
The stack pointer %rsp
holds the address of the top stack element.
pushq %rax
is equivalent to:
subq $8, %rsp # Decrement stack pointer Store
movq %rax, (%rsp) # Store %rax on stack
popq %rax
is equivalent to:
movq (%rsp), %rax # Read %rax from stack
addq $8, (%rsp) # Increment stack pointer
2.3 Arithmetic and Logical Operations
leaq (%rdx), %rax
copies the effective address (but not the value stored in the address) to the destination.- Shift
<shift instruction> <imm value / single-byte register %cl>
(only allowing this specific register%cl
as the operand)- a shift instruction operating on data values that are \(w\) bits long determines the shift amount from the low-order \(m\) bits of register
%cl
, where \(2^m = w\). The higher-order bits are ignored.- So, for example, when register
%cl
has hexadecimal value0xFF
, then instructionsalb
would shift by 7, whilesalw
would shift by 15,sall
would shift by 31, andsalq
would shift by 63.
- So, for example, when register
- Different right shift instructions:
SAR
performs an arithmetic shift (fill with copies of the sign bit)SHR
performs a logical shift (fill with zeros)
Special Arithmetic Operations:
oct word: 8 words, 16-byte
Byte Order
The code above is generated for little-endian machine, on which the high-order (more significant) bytes are stored at higher addresses.
3. Control
3.1 Condition Codes
a set of single-bit condition code registers
- CF: Carry flag. The most recent operation generated a carry out of the most significant bit. Used to detect overflow for unsigned operations.
- ZF: Zero flag. The most recent operation yielded zero.
- SF: Sign flag. The most recent operation yielded a negative value.
- OF: Overflow flag. The most recent operation caused a two’s-complement overflow—either negative or positive.
leaq
instruction does not alter any condition codes, since it is intended to be used in address computations.- Otherwise, all of the instructions listed in Figure 3.10 cause the condition codes to be set.
3.2 Accessing the Condition Codes
e.g. Compute a < b
with data type long
:
cmpq %rsi %rdi
~%rdi - %rsi
~a - b
a < b
~a - b < 0
~SF=1 && OF=0 || SF=0 && OF=1
3.3 Jump Instructions
Different encodings for target instruction of jumps:
- PC relative: the difference between the address of the target instruction and the address of the instruction immediately following the jump
- the instructions can be compactly encoded (requiring just 2 bytes)
- the object code can be shifted to different positions in memory without alteration
- Absolute address: using 4 bytes to directly specify the target
if (test_expr)
then_statement;
else
else_statement;
is equivalent to assembly code written in C:
t = test_expr;
if (!t)
goto false;
then_statement;
goto done;
false:
else_statement;
done:
3.4 Implementing Conditional Branches with Conditional Moves
- The source and destination values can be 16, 32, or 64 bits long. Single-byte conditional moves are not supported.
- The assembler can infer the operand length of a conditional move instruction from the name of the destination register, and so the same instruction name can be used for all operand lengths.
e.g.
3.5 Loop
Loop gets turned into a lower-level combination of tests and conditional jumps.
3.5.1 do-while
loop:
body_statement;
t = test_expr;
if (t)
goto loop;
3.5.2 while
the 1st way: jump to middle (-Og
)
goto test;
loop:
body_statement;
test:
t = test_expr;
if (t)
goto loop;
the 2nd way: guarded do (-O1
)
t = test_expr;
if (!t)
goto done;
do
body_statement;
while (test_expr);
done:
Using this implementation strategy, the compiler can often optimize the initial test, for example, determining that the test condition will always hold.
3.5.3 for
for (init-expr; test-expr; update-expr)
body-statement;
// is equivalent to
init-expr;
while (test-expr) {
body-statement;
update-expr;
}
3.6 switch
jump table jt
The use of a jump table allows a very efficient way to implement a multiway branch!
The authors of gcc created a new operator &&
to create a pointer for a code location.
void switch_eg(long x, long n,
long *dest)
{
long val = x;
switch (n)
{
case 100:
val *= 13;
break;
case 102:
val += 10;
/* Fall through */
case 103:
val += 11;
break;
case 104:
case 106:
val *= val;
break;
default:
val = 0;
}
*dest = val;
}
// is translated to
void switch_eg_impl(long x, long n,
long *dest)
{
/* Table of code pointers */
static void *jt[7] = {
&&loc_A, &&loc_def, &&loc_B,
&&loc_C, &&loc_D, &&loc_def,
&&loc_D};
unsigned long index = n - 100;
long val;
if (index > 6)
goto loc_def;
/* Multiway branch */
goto *jt[index];
loc_A: /* Case 100 */
val = x * 13;
goto done;
loc_B: /* Case 102 */
x = x + 10;
/* Fall through */
loc_C: /* Case 103 */
val = x + 11;
goto done;
loc_D: /* Cases 104, 106 */
val = x * x;
goto done;
loc_def: /* Default case */
val = 0;
done:
*dest = val;
}
Declarations of jump table:
4. Procedures
Features:
- x86-64 stack grows toward lower addresses
- stack pointer
%rsp
points to the top element of the stack - stack frame structure
- When procedure P calls procedure Q, it will push the return address onto the stack, indicating where within P the program should resume execution once Q returns
4.1 Control Transfer
call:
- pushes an address A (return address) onto the stack and sets the PC to the beginning of Q
- return address is computed as the address of the instruction immediately following the call instruction
ret:
- pops an address A off the stack and sets the PC to A
4.2 Data Transfer
When passing parameters on the stack, all data sizes are rounded up to be multiples of eight.
With the arguments in place, the program can then execute a call instruction to transfer control to procedure Q.
Except the first 6 args, others are allocated within the stack frame (with higher addresses than the stack pointer %rsp
).
4.3 Local Storage on the Stack
At times, however, local data must be stored in memory:
- There are not enough registers to hold
- The address operator
&
is applied to a local variable (must generate an address) - arrays or structures
A procedure allocates space on the stack frame by decrementing the stack pointer.
4.4 Local Storage in Registers
By convention, registers %rbx
, %rbp
, and %r12
–%r15
are classified as callee-saved registers. When procedure P calls procedure Q, Q must preserve the values of these registers.
All other registers, except for the stack pointer %rsp
, are classified as caller-saved registers. (They can be modified by any function!)
push
and pop
decrement and increment the stack pointer, respectively.
5. Array Allocation and Access
5.1 Pointer Arithmetic
\(p\) is a pointer to data of type T , and the value of \(p\) is \(x_p\), then the expression \(p+i\) has value \(x_p + L \cdot i\), where \(L\) is the size of data type T.
The array reference A[i]
is identical to the expression *(A+i)
.
5.2 Nested Arrays
5.3 Array Access Optimization
Fix-sized:
Variable-sized:
5.4 Variable-Size Stack Frames
Variable-sized arrays needs the support of variable-size stack frames.
6. Heterogeneous Data Structures
All of the components of a structure are stored in a contiguous region of memory and a pointer to a structure is the address of its first byte.
The selection of the different fields of a structure is handled completely at compile time. (The machine code contains no information about the field declarations or the names of the fields.)
6.1 Data Alignment
The x86-64 hardware will work correctly regardless of the alignment of data. However, Intel recommends that data be aligned to improve memory system performance.
- Any primitive object of K bytes must have an address that is a multiple of K.
- For struct:
- Each structure element should satisfy its alignment requirement.
- Any pointer of the struct satisfies a K-byte alignment. (K is the maximum size of its members.) (The total size of the struct should be a multiple of K.)
7. Combining Control and Data in Machine-Level Programs
The value of a function pointer is the address of the first instruction in the machine-code representation of the function.
7.1 Buffer Overflow
Thwart:
-
Stack randomization (a kind of ASLR, address-space layout randomization)
-
allocating a random amount of space between 0 and n bytes on the stack at the
start of a program
-
-
Stack protector
- Store a special canary value (guard value) in the stack frame between any local buffer and the rest of the stack state.
- Before restoring the register state and returning from the function, the program checks if the canary has been altered. (compare with the initial value)
-fno-stack-protector
prevent gcc from inserting stack protector.
Attack:
- nop (no operation) sequence (nop sled, i.e. slides through the sequence)